TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree

Andrusenko, Andrei, Bataev, Vladimir, Grigoryan, Lilit, Lavrukhin, Vitaly, Ginsburg, Boris

arXiv.org Artificial Intelligence

Recognizing specific key phrases is an essential task for contextualized Automatic Speech Recognition (ASR). However, most existing context-biasing approaches have limitations: they require additional model training, significantly slow down decoding, or constrain the choice of ASR system type. This paper proposes a universal ASR context-biasing framework that supports all major model types: CTC, Transducers, and Attention Encoder-Decoder models. The framework is based on a GPU-accelerated word-boosting tree, which enables shallow-fusion use with greedy and beam-search decoding without noticeable speed degradation, even with a vast number of key phrases (up to 20K items). The obtained results show the high efficiency of the proposed method, which surpasses the considered open-source context-biasing approaches in both accuracy and decoding speed. Our context-biasing framework is open-sourced as part of the NeMo toolkit. Modern end-to-end automatic speech recognition (ASR) systems, such as Connectionist Temporal Classification (CTC) [1], Recurrent Neural Transducer (RNN-T) [2], and Attention Encoder-Decoder (AED) [3], already achieve relatively high speech recognition accuracy in common data domains [4].
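The core data structure here, a phrase-boosting prefix tree used in shallow-fusion decoding, can be sketched in a few lines. This is a minimal illustrative sketch, not NeMo's actual implementation: the class and method names (`PhraseTrie`, `step`, `boost`) and the per-token bonus scheme are assumptions for exposition.

```python
# Minimal sketch of a phrase-boosting trie for shallow-fusion context
# biasing. Hypothetical names and scoring; the real framework runs this
# batched on GPU across beam hypotheses.

class PhraseTrie:
    """Prefix tree over the token sequences of key phrases."""
    def __init__(self, boost=2.0):
        self.root = {}
        self.boost = boost  # per-token score bonus added during decoding

    def add_phrase(self, tokens):
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})
        node["$"] = True  # end-of-phrase marker

    def step(self, state, token):
        """Advance the trie state by one decoded token.

        Returns (new_state, score_delta). A token that extends a key
        phrase earns +boost; otherwise we fall back to the root and
        check whether a phrase could restart at this token.
        """
        node = state if state is not None else self.root
        if token in node:
            return node[token], self.boost
        if token in self.root:  # a phrase may restart here
            return self.root[token], self.boost
        return self.root, 0.0


trie = PhraseTrie(boost=2.0)
trie.add_phrase(["nvi", "dia"])  # illustrative subword units
trie.add_phrase(["ne", "mo"])

# Greedy decoding with shallow fusion: the trie bonus would be added to
# each candidate token's acoustic log-probability before the argmax.
state, total_bonus = None, 0.0
for hyp_token in ["ne", "mo", "runs"]:
    state, delta = trie.step(state, hyp_token)
    total_bonus += delta
print(total_bonus)  # "ne" and "mo" matched, "runs" did not -> 4.0
```

Because each hypothesis only carries a trie-node pointer, the cost per decoding step is independent of the number of key phrases, which is what makes very large lists (tens of thousands of items) feasible.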


An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition

Wang, Yi-Cheng, Pai, Li-Ting, Yan, Bi-Cheng, Wang, Hsin-Wei, Lin, Chi-Han, Chen, Berlin

arXiv.org Artificial Intelligence

End-to-end (E2E) automatic speech recognition (ASR) models have become standard practice in various commercial applications. However, in real-world scenarios, the long-tailed nature of word distributions often leads E2E ASR models to perform well on common words but fall short in recognizing uncommon ones. Recently, the notion of a contextual adapter (CA) was proposed to infuse external knowledge, represented by a context word list, into E2E ASR models. Although CA can improve recognition performance on rare words, two crucial data imbalance problems remain. First, when low-frequency words are used as context words during training, these words rarely occur in the utterance, so CA becomes prone to overfitting its attention on that token because higher-frequency words are absent from the context list. Second, the long-tailed distribution within the context list itself still causes the model to perform poorly on low-frequency context words. In light of this, we explore in depth how altering the frequency distribution of words in the context list affects model performance, and we extend CA with a simple yet effective context-balanced learning objective. A series of experiments on the AISHELL-1 benchmark dataset suggests that using all vocabulary words from the training corpus as the context list, paired with our balanced objective, yields the best performance, with a significant character error rate (CER) reduction of up to 1.21% and a more pronounced 9.44% reduction in the error rate on zero-shot words.
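One common way to realize a frequency-balanced objective of this kind is to weight the loss on each context word inversely to its corpus frequency. The sketch below uses the "effective number of samples" class-balanced weighting as a stand-in; the function name, the weighting formula, and the example counts are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of frequency-balanced loss weights for context
# words: rare words in the context list receive larger weights so the
# adapter does not underfit them. Hypothetical formulation.

from collections import Counter

def balanced_weights(context_words, corpus_counts, beta=0.999):
    """Class-balanced weights w = (1 - beta) / (1 - beta**n), where n is
    the word's corpus count, normalized to mean 1 over the context list."""
    weights = {}
    for w in context_words:
        n = max(corpus_counts.get(w, 0), 1)
        weights[w] = (1 - beta) / (1 - beta ** n)
    mean = sum(weights.values()) / len(weights)
    return {w: v / mean for w, v in weights.items()}

# Toy corpus counts: one frequent, one mid-frequency, one rare word.
counts = Counter({"taxi": 5000, "metro": 800, "zhongguancun": 3})
w = balanced_weights(["taxi", "metro", "zhongguancun"], counts)
# The rare word receives the largest weight:
assert w["zhongguancun"] > w["metro"] > w["taxi"]
```

These per-word weights would then scale the training loss at positions where the corresponding context word is the target, counteracting the long tail inside the context list.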


Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

Bleeker, Maurits, Swietojanski, Pawel, Braun, Stefan, Zhuang, Xiaodan

arXiv.org Artificial Intelligence

This paper presents an extension for training end-to-end Context-Aware Transformer Transducer (CATT) models using a simple yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list, alongside random and ground-truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases. This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions on the contextual portion of the test data. We also extend and evaluate the CATT approach in streaming applications.
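The mining step itself is conceptually simple: embed all candidate phrases with the context encoder, then retrieve the neighbours of the reference query and treat them as hard negatives. The sketch below uses exact cosine-similarity search over toy vectors in place of a real ANN index, and all names (`mine_hard_negatives`, the example embeddings) are illustrative assumptions.

```python
# Sketch of hard-negative phrase mining via nearest-neighbour search in
# the context encoder's embedding space. Exact search stands in for the
# approximate (ANN) index that would be used at scale.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mine_hard_negatives(query_emb, phrase_embs, k=2):
    """Return the k phrases closest to the reference query embedding,
    excluding the query itself (similarity ~1.0)."""
    scored = [(name, cosine(query_emb, e)) for name, e in phrase_embs.items()]
    scored = [(n, s) for n, s in scored if s < 0.999]  # drop the query
    scored.sort(key=lambda x: -x[1])
    return [n for n, _ in scored[:k]]

# Toy embeddings: "jon smyth" is close to "john smith" in latent space,
# exactly the kind of confusable phrase worth training against.
embs = {
    "john smith": [1.0, 0.1, 0.0],
    "jon smyth":  [0.9, 0.2, 0.1],
    "taxi stand": [0.0, 0.1, 1.0],
}
negs = mine_hard_negatives(embs["john smith"], embs, k=1)
print(negs)  # ['jon smyth'] -- similar but not identical: a hard negative
```

Adding such neighbours to the training context list forces the model to attend to the correct phrase even when near-duplicates are present, which is precisely the failure mode in large biasing inventories.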


A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems

Wang, Xiaoqiang, Liu, Yanqing, Zhao, Sheng, Li, Jinyu

arXiv.org Artificial Intelligence

It's challenging to customize a transducer-based automatic speech recognition (ASR) system with context information which is dynamic and unavailable during model training. In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large-size context lists.

In this work, we propose a novel contextual biasing method which leverages contextual information by adding a contextual spelling correction (CSC) model on top of the transducer model. To consider contextual information during correction, a context encoder which encodes context phrases into hidden embeddings is added to the spelling correction model [16, 17]; the decoder of the correction model then attends to the context encoder and text encoder by an attention mechanism [18].
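The filtering idea mentioned above can be sketched as a pre-selection pass: before the CSC model runs, prune a large context list down to phrases that plausibly match something in the first-pass hypothesis. The criterion below (character-level similarity via `difflib.SequenceMatcher`) and all names are assumptions for illustration; the paper's actual filtering algorithm may differ.

```python
# Hypothetical sketch of context-list filtering for a contextual
# spelling-correction (CSC) model: keep only phrases whose surface form
# resembles some word in the first-pass ASR hypothesis.

from difflib import SequenceMatcher

def filter_context(hypothesis, context_phrases, threshold=0.6, top_k=3):
    """Keep phrases whose similarity to some hypothesis word exceeds the
    threshold; only these survivors are fed to the CSC decoder."""
    kept = []
    for phrase in context_phrases:
        best = max(
            SequenceMatcher(None, phrase.lower(), word.lower()).ratio()
            for word in hypothesis.split()
        )
        if best >= threshold:
            kept.append((best, phrase))
    kept.sort(reverse=True)          # most similar phrases first
    return [p for _, p in kept[:top_k]]

phrases = ["Gulliver", "Goliath", "weather", "Waze"]
hyp = "call goulliver now"          # first-pass hypothesis with an error
print(filter_context(hyp, phrases))  # ['Gulliver']
```

Shrinking the candidate set this way is what keeps the correction model light-weight even when the full context list contains thousands of entries.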